Infectious diseases and computer malware spread among humans and computers through the network
of contacts among them. These networks are characterized by wide connectivity fluctuations,
connectivity correlations, and the small-world property. In a previous work [A. Vazquez, Phys. Rev.
Lett. 96, 038702 (2006)] I have shown that the connectivity fluctuations together with the
small-world property lead to a novel spreading law, characterized by an initial power law growth with
an exponent determined by the average node distance on the network. Here I extend these results
to consider the influence of connectivity correlations, which are generally observed in real networks.
I show that assortative and disassortative connectivity correlations enhance and diminish,
respectively, the range of validity of this spreading law. As a corollary I obtain the region of connectivity
fluctuations and degree correlations characterized by the absence of an epidemic threshold. These
results are relevant for the spreading of infectious diseases, rumors, and information among humans
and the spreading of computer viruses, email worms, and hoaxes among computer users.

PACS numbers: 89.75.Hc, 05.70.Ln, 87.19.Xx, 87.23.Ge

I. INTRODUCTION

Halting an epidemic outbreak in its early stages requires a detailed understanding of the progression of the number
of new infections (incidence). Current mathematical models predict that the incidence grows exponentially during the
initial phase of an epidemic outbreak. Within this exponential growth scenario infectious diseases are
characterized by the average reproductive number, giving the number of secondary infections generated by a primary
case, and the average generation time, giving the average time elapsed between the infection of a primary case and its
secondary cases. In turn, vaccination strategies are designed to modify the reproductive number and
the generation time.

I have recently shown, however, that this picture changes dramatically when the graph underlying the spreading
dynamics is characterized by a power law degree distribution, where the degree of a node is defined as the
number of its connections. The significant abundance of high degree nodes (hubs) has the consequence that most
nodes are infected on a time scale of the order of the disease generation time. Furthermore, the initial incidence
growth is no longer exponential but follows a power law growth $n(t) \sim t^{D-1}$, where $D$ is the characteristic distance
between nodes on the graph. Yet, these predictions are limited to uncorrelated graphs and the susceptible-infected
(SI) model.

In this work I extend the theory of age-dependent branching processes to consider the topological properties
of real networks. First, I generalize my previous study to include degree correlations. This is a fundamental
advance since real networks are characterized by degree correlations that may significantly affect the
system's behavior. Second, I consider the susceptible-infected-removed (SIR) model, which provides a more
realistic description of real epidemic outbreaks, allowing us to obtain conclusions about the impact of patient
isolation and immunization strategies on the final outbreak size. Finally, I survey our current knowledge about
different networks underlying the spreading of infectious diseases and computer malware and discuss the impact of
their topology on the spreading dynamics.

II. POPULATION STRUCTURE



Consider a population of $N$ susceptible agents (humans, computers, etc.) and an infectious disease (human disease,
computer malware, etc.) spreading among them. The potential disease transmission channels are represented by an
undirected graph, where nodes represent susceptible agents and edges represent disease transmission channels. For
example, when analyzing the spreading of sexually transmitted diseases the relevant graph is the web of sexual contacts,
where nodes represent sexually active individuals and edges represent sexual relationships.

The degree of a node is the number of edges connecting this node to other nodes (neighbors) in the graph. Given
the finite size of the population there is a maximum degree $k_{\max}$, where $k_{\max}$ is at most $N - 1$. I denote by $p_k$ the
probability that a node has degree $k$. The results obtained in this work are valid for arbitrary degree
distributions. Nevertheless, recent studies have shown that several real networks are characterized by the power law
degree distribution

$$ p_k \propto k^{-\gamma} , \tag{1} $$

with $\gamma > 2$. Therefore, I focus the discussion on this particular case.

Real networks are also characterized by degree correlations between connected nodes. Networks representing
technological and biological systems exhibit disassortative (negative) correlations, with a tendency to have connections
between nodes with dissimilar degrees. In contrast, social networks are characterized by assortative (positive)
degree correlations, with a tendency to have connections among nodes with similar degrees. To characterize the
degree correlations I consider the probability distribution $q(k'|k)$ that a neighbor of a node with degree $k$ has degree
$k'$. It is important to note that the probability distributions $p_k$ and $q(k'|k)$ are related to each other by the detailed
balance condition

$$ k\, p_k\, q(k'|k) = k'\, p_{k'}\, q(k|k') . \tag{2} $$

Although $q(k'|k)$ contains all the information necessary to characterize the degree correlations, it is difficult to analyze.
A more intuitive measure, which often appears in the analytical calculations, is the average neighbors' excess
degree

$$ K_k = \sum_{k'=2}^{\infty} q(k'|k)\,(k'-1) . \tag{3} $$

The empirical data indicates that

$$ K_k = c\, k^{\nu} , \tag{4} $$

where $c$ is obtained from the detailed balance condition, resulting in

$$ c = \frac{\langle k(k-1) \rangle}{\langle k^{1+\nu} \rangle} . \tag{5} $$

When the degree correlations are disassortative, the nearest neighbors of a low (high) degree node tend to have
larger (smaller) degrees. In this case $K_k$ decreases with increasing $k$. In contrast, when the degree correlations are
assortative, the nearest neighbors of a low (high) degree node tend to have similarly low (high) degrees. In this case
$K_k$ increases with increasing $k$. Therefore, disassortative and assortative correlations are characterized by $\nu < 0$
and $\nu > 0$, respectively.
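
The constant $c$ in (5) follows from multiplying the detailed balance condition (2) by $(k'-1)$ and summing over $k$ and $k'$, which gives $\sum_k k\, p_k K_k = \langle k(k-1) \rangle$. The following is a minimal numerical sketch of that identity, assuming a power law $p_k \propto k^{-\gamma}$ truncated at an arbitrary $k_{\max} = 1000$ and $K_k = c k^{\nu}$. The function names and parameter values are illustrative choices, not part of the original work.

```python
def power_law_pk(gamma, k_max, k_min=1):
    """Normalized truncated power law p_k ~ k^(-gamma) on k_min..k_max."""
    weights = {k: k ** (-gamma) for k in range(k_min, k_max + 1)}
    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}


def moment(pk, f):
    """Average <f(k)> over the degree distribution p_k."""
    return sum(p * f(k) for k, p in pk.items())


def correlation_constant(pk, nu):
    """c = <k(k-1)> / <k^(1+nu)>, the form implied by detailed balance."""
    return moment(pk, lambda k: k * (k - 1)) / moment(pk, lambda k: k ** (1 + nu))


# Illustrative parameters (assumptions, not values from the paper).
pk = power_law_pk(gamma=2.5, k_max=1000)
for nu in (-0.5, 0.0, 0.5):
    c = correlation_constant(pk, nu)
    lhs = moment(pk, lambda k: k * c * k ** nu)  # sum_k k p_k K_k
    rhs = moment(pk, lambda k: k * (k - 1))      # <k(k-1)>
    print(f"nu={nu:+.1f}  c={c:.4f}  sum_k k p_k K_k={lhs:.4f}  <k(k-1)>={rhs:.4f}")
```

For each value of $\nu$ the last two printed columns should agree up to rounding, which is the consistency requirement that fixes $c$.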

Real networks also exhibit the small-world property, meaning that the average distance $D$ between two nodes
in the graph is small or grows at most as $\log N$. For instance, social experiments such as the Kevin Bacon and
Erdős numbers or the Milgram experiment reveal that social actors are separated by a small number of
acquaintances. This property is enhanced on graphs with a power law degree distribution (1) with $2 < \gamma < 3$.
In this case the average distance between two nodes grows as $\log \log N$, a regime known as the ultra
small-world.

Given the graph underlying the spreading of an infectious disease, let us consider an epidemic outbreak starting 
from a single node (the root, patient zero, or index case). In the worst case scenario the disease propagates to all 
the nodes that could be reached from the root using the graph connections. Thus, the outbreak is represented by a 
spanning or causal tree from the root to all reachable nodes. The generation of a node on the tree corresponds with 
the topological or hopping distance from the root. This picture motivates the introduction of the following branching 
process: 

Definition II.1. Annealed Spanning Tree (AST) with degree correlations

Consider a graph with degree probability distribution $p_k$ and average degree $\langle k \rangle$, neighbors' degree distribution
$q(k'|k)$ given a node with degree $k$, detailed balance condition (2), and average distance between nodes $D$. The
annealed spanning tree (AST) associated with this graph is the branching process satisfying the following properties:

1. The process starts from a single node, the root, at generation $d = 0$. The root generates $k$ sons with probability
distribution $p_k$.

2. Each son at generation $0 < d < D$ generates $k' - 1$ other sons with probability distribution $q(k'|k)$, given that its
parent node has degree $k$.

3. A son at generation $d = D$ does not generate new sons.

The term annealed means that we are not analyzing the true (quenched) spanning tree on the graph but a branching
process with similar statistical properties. From the mathematical point of view the AST is a generalization of
the Galton-Watson branching process to the case where (i) the reproductive number of a node depends on the
reproductive number of its ancestor and (ii) the process is truncated at generation $D$. This mathematical construction
has been previously introduced to analyze the percolation properties of graphs with degree correlations.
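
The three rules of the AST can be sketched as a small sampler. This is a minimal illustration that assumes an uncorrelated graph, for which $q(k'|k) = k' p_{k'} / \langle k \rangle$ independently of $k$; a correlated $q(k'|k)$ would require sampling each son's degree conditionally on its parent's degree. The function names, the toy degree distribution, and the sample size are all hypothetical.

```python
import random


def sample_ast(pk, D, rng):
    """Sample one AST and return the number of nodes at generations 1..D.

    Assumption: uncorrelated graph, q(k'|k) = k' p_k' / <k> for every k.
    """
    degrees = list(pk)
    p_weights = [pk[k] for k in degrees]
    q_weights = [k * pk[k] for k in degrees]  # unnormalized q(k')
    # Rule 1: the root generates k sons, with k drawn from p_k.
    n_nodes = rng.choices(degrees, weights=p_weights)[0]
    sizes = [n_nodes]
    # Rule 2: each son at generation 0 < d < D has degree k' drawn from q
    # and generates k' - 1 sons of its own.
    for _ in range(1, D):
        son_degrees = rng.choices(degrees, weights=q_weights, k=n_nodes)
        n_nodes = sum(k - 1 for k in son_degrees)
        sizes.append(n_nodes)
    # Rule 3: generation D does not reproduce, so the loop stops here.
    return sizes


def mean_generation_sizes(pk, D, n_samples, seed=0):
    """Monte Carlo estimate of the average number of nodes per generation."""
    rng = random.Random(seed)
    totals = [0] * D
    for _ in range(n_samples):
        for d, size in enumerate(sample_ast(pk, D, rng)):
            totals[d] += size
    return [total / n_samples for total in totals]


# Toy degree distribution (an assumption chosen only for illustration).
toy_pk = {1: 0.5, 2: 0.3, 3: 0.2}
print(mean_generation_sizes(toy_pk, D=3, n_samples=5000))
```

In this uncorrelated toy case the estimated means should approach $\langle k \rangle$, $\langle k(k-1) \rangle$, and $\langle k(k-1) \rangle^2 / \langle k \rangle$ for generations 1 to 3.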

The sharp truncation of the branching process at generation $D$ is an approximation. In the original graph there are
some nodes beyond the average distance between nodes $D$ and their average degree decreases with increasing
generation. Therefore, a more realistic description is obtained by making $q(k'|k)$ generation dependent and truncating
the branching process when the number of generations equals the graph diameter. Yet, an analytical treatment of
this more realistic model is either unfeasible or results in equations that must be solved numerically, questioning
its advantage with respect to direct simulations on the original graph. To allow for an analytical understanding I
truncate the branching process at generation $d = D$, where $D$ represents the average distance between nodes in
the original graph. Furthermore, I assume that $q(k'|k)$ is the same for all generations $0 < d < D$. At this point it is
worth noticing that all results derived below are exact for the AST but an approximation for the original graph.



III. SIR MODEL OF DISEASE SPREADING 



The AST describes the case where all neighbors of an infected node are infected, and all at the same time. More
generally, a node infects a fraction of its neighbors and these infections take place at variable times. The susceptible
→ infected → removed (SIR) model is an appropriate framework to consider the timing of the infection events. The
time scales for the transitions susceptible → infected and infected → removed are characterized by the distribution
functions of infection and removal times, $G_I(\tau)$ and $G_R(\tau)$, respectively. For example, $G_I(\tau)$ is the probability that the
infection time is less than or equal to $\tau$.

Consider an infected node $i$ and a susceptible neighbor $j$. The probability $b(t)$ that $j$ is infected by time $t$, given that $i$
was infected at time zero, is the combination of two factors. First, the infection time should be smaller than $t$ and,
second, the removal time of the ancestor $i$ should be larger than the infection time. More precisely,

$$ b(t) = \int_0^t dG_I(\tau)\,[1 - G_R(\tau)] . \tag{6} $$

From $b(t)$ I obtain the probability that $j$ gets infected, no matter when,

$$ r = \lim_{t \to \infty} b(t) , \tag{7} $$

and the distribution function of the generation times

$$ G(\tau) = \frac{1}{r}\, b(\tau) . \tag{8} $$

In the original Kermack-McKendrick formulation of the SIR model the disease spreads at a rate $\lambda$ from infected
to susceptible nodes and infected nodes are removed at rate $\mu$. In this case the infection and removal times
are exponentially distributed, $G_I(\tau) = 1 - e^{-\lambda\tau}$ and $G_R(\tau) = 1 - e^{-\mu\tau}$, resulting in

$$ r_{\rm SIR} = \frac{\lambda}{\mu + \lambda} , \tag{9} $$

$$ G_{\rm SIR}(\tau) = 1 - e^{-(\lambda+\mu)\tau} . \tag{10} $$
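
As a sanity check of the exponential case, the integral $b(t) = \int_0^t dG_I(\tau)[1 - G_R(\tau)]$ can be evaluated numerically and compared with the closed form $r\,[1 - e^{-(\lambda+\mu)t}]$, where $r = \lambda/(\lambda+\mu)$. The sketch below uses a plain midpoint rule; the function names and the rate values are arbitrary assumptions made for illustration.

```python
import math


def b_numeric(t, lam, mu, steps=20000):
    """Midpoint-rule estimate of b(t) for exponential infection/removal times."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * h
        infection_density = lam * math.exp(-lam * tau)  # dG_I / dtau
        not_yet_removed = math.exp(-mu * tau)           # 1 - G_R(tau)
        total += infection_density * not_yet_removed * h
    return total


def b_closed(t, lam, mu):
    """Closed form r * G(t), with r = lam / (lam + mu) and G exponential."""
    r = lam / (lam + mu)
    return r * (1.0 - math.exp(-(lam + mu) * t))


lam, mu = 0.8, 0.3  # illustrative rates, not taken from the paper
for t in (0.5, 1.5, 5.0):
    print(f"t={t}: numeric={b_numeric(t, lam, mu):.6f}  closed={b_closed(t, lam, mu):.6f}")
```

The two columns should agree to the accuracy of the quadrature, and for large $t$ both approach $r$.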

Some of the results obtained in this work are valid for any generation time distribution. We focus, however, on the
SIR model with constant rates of infection and removal (9)-(10).

At this point we can extend the AST definition to account for the variable infection times: 

Definition III.1. Age-dependent AST with degree correlations

The age-dependent AST is an AST where nodes can be in two states, susceptible or infected, and

1. An infected node (primary case) infects each of its neighbors (secondary cases) with probability $r$.

2. The generation times, the times elapsed from the infection of a primary case to the infection of a secondary case,
are independent random variables with probability distribution $G(\tau)$.

The age-dependent AST is a generalization of the Bellman-Harris and Crump-Mode-Jagers age-dependent
branching processes. The key new elements are the degree correlations and the truncation at a maximum generation,
allowing us to consider the topological properties of real networks.



IV. SPREADING DYNAMICS AND FINAL OUTBREAK SIZE 



Let $I(t)dt$ be the average number of nodes that are infected between time $t$ and $t + dt$, given that patient zero (the
root) was infected at time zero. This quantity is known in the epidemiology literature as the incidence. Consider
an age-dependent AST and constant infection and removal rates (9)-(10). Making use of the tree structure I obtain
(Appendix A)



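$$ I(t) = \sum_{d=1}^{D} z_d \left[ \lambda \frac{(\lambda t)^{d-1}}{(d-1)!}\, e^{-(\lambda+\mu)t} \right] , \tag{11} $$

where

$$ z_d = \begin{cases} \langle k \rangle , & d = 1 \\ \sum_{k=1}^{k_{\max}} p_k\, k(k-1)\, K_k^{d-2} , & d > 1 \end{cases} \tag{12} $$

is the average number of nodes at generation $d$, satisfying the normalization condition

$$ 1 + \sum_{d=1}^{D} z_d = N . \tag{13} $$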

The interpretation of (11) is the following. If every generation time were equal to one time unit, then on average $z_d$
new nodes would be found at generation $d$. Since the infection times are variable, however, nodes at the same generation
may be infected at different times. This contribution is taken into account by the factor between $[\cdots]$ in (11), which is
the probability density of the sum of $d$ generation times weighted by the probability $r^d$ that all $d$ transmissions occur.

Independent of the particular $d$ dependence of $z_d$, from (11) it follows that the incidence decays exponentially for
long times with a decay time $1/(\lambda + \mu)$. This result is a consequence of the finite population size, i.e. sooner or
later most of the nodes are infected and the number of new infections decays. In contrast, the factor remaining after
excluding the exponential decay is an increasing function of time and it dominates the spreading dynamics at short
and intermediate times. I obtain the following result determining the speed of the initial growth:
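
To make the structure of the incidence concrete, the sketch below evaluates $I(t) = \sum_{d=1}^{D} z_d\, \lambda (\lambda t)^{d-1} e^{-(\lambda+\mu)t} / (d-1)!$ for a given list of generation sizes. The values of $z_d$, $\lambda$ and $\mu$ are made-up numbers chosen only for illustration, and `incidence` is a hypothetical helper name.

```python
import math


def incidence(t, z, lam, mu):
    """Incidence I(t) for SIR rates lam, mu; z[d-1] holds z_d for d = 1..D."""
    decay = math.exp(-(lam + mu) * t)
    return sum(
        z_d * lam * (lam * t) ** (d - 1) / math.factorial(d - 1) * decay
        for d, z_d in enumerate(z, start=1)
    )


# Illustrative generation sizes and rates (assumptions, not data).
z = [3.0, 12.0, 50.0]
lam, mu = 1.0, 0.5
for t in (0.0, 0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"t={t:5.1f}  I(t)={incidence(t, z, lam, mu):.4f}")
```

With these numbers the incidence first rises, because the polynomial factors dominate, and then decays exponentially. Integrating each term over all times gives $z_d\, r^d$ with $r = \lambda/(\lambda+\mu)$, the expected number of generation-$d$ nodes that are eventually infected.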

Theorem IV.1. Consider the normalized incidence

$$ \rho(t) = \frac{I(t)}{N} . \tag{14} $$

If there is some integer $d_c < D$ and real numbers $a$ and $b > 0$ such that for all $d > d_c$

(15)

when $k_{\max} \to \infty$, then

$$ \rho(t) = \lambda \frac{(\lambda t)^{D-1}}{(D-1)!}\, e^{-(\lambda+\mu)t} \left[ 1 + O\!\left( \frac{t_0}{t} \right) \right] , \tag{16} $$

where

$$ t_0 = \frac{D-1}{\lambda}\, \frac{z_{D-1}}{z_D} . \tag{17} $$

FIG. 1: $\gamma$-$\nu$ plane showing the regions where Theorem IV.1 is valid (shaded region) for the case of a power law degree
distribution (1) and degree correlations (4). The text insets indicate the exponent $b$ in (15).



The symbol $O(t_0/t)$ indicates that (16) is valid asymptotically when $t \gg t_0$ and it represents correction terms of the
order of $t_0/t$. The demonstration of this result is straightforward. From (15) it follows that for all $d > d_c$ the average
number of nodes at generation $d$ (12) is of the order of $z_d \sim k_{\max}^{a+bd}$. Therefore, in the limit $k_{\max} \to \infty$ the sums
in (11) and (13) are dominated by the $d = D$ term and corrections are given by the ratio between the $d = D - 1$ and
$d = D$ terms.

The initial dynamics is characterized by a power law growth with an exponent determined by the average distance
$D$. The characteristic time $t_0$ marks the time scale at which this polynomial growth starts to be manifested. This time
is particularly small for graphs with a large maximum degree that satisfy the small-world property, i.e. $D$ is small.
For instance, let us consider a power law degree distribution (1) with $\gamma > 2$ and degree correlations (4). The values
of $\gamma$ and $\nu$ for which the condition (15) is satisfied are given in Fig. 1, together with the exponent $b$. Disassortative
degree correlations ($\nu < 0$) may invalidate the condition (15), indicating that strong disassortative correlations may
lead to deviations from the Theorem IV.1 prediction. This observation is in agreement with a previous study focusing
on the epidemic threshold. In contrast, for assortative degree correlations ($\nu > 0$) the condition (15) is satisfied for
all $\gamma > 2$. In other words, assortative correlations enhance the degree fluctuations, extending the validity of Theorem
IV.1 to the $\gamma > 3$ region.
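
The dependence of the generation sizes on the maximum degree can be explored numerically. The sketch below assumes a truncated power law $p_k \propto k^{-\gamma}$ on $1 \le k \le k_{\max}$, correlations of the form $K_k = c k^{\nu}$ with $c = \langle k(k-1) \rangle / \langle k^{1+\nu} \rangle$, generation sizes $z_1 = \langle k \rangle$ and $z_d = \sum_k p_k k(k-1) K_k^{d-2}$ for $d > 1$, and the correction time scale $t_0 = (D-1) z_{D-1} / (\lambda z_D)$ obtained from the ratio of the $d = D-1$ and $d = D$ terms of the incidence. The parameter values and function names are illustrative assumptions.

```python
def z_generations(gamma, nu, k_max, D):
    """Return [z_1, ..., z_D] for a truncated power law with K_k = c k^nu."""
    ks = range(1, k_max + 1)
    weights = [k ** (-gamma) for k in ks]
    norm = sum(weights)
    p = [w / norm for w in weights]
    mean_k = sum(pk * k for pk, k in zip(p, ks))
    second = sum(pk * k * (k - 1) for pk, k in zip(p, ks))
    c = second / sum(pk * k ** (1 + nu) for pk, k in zip(p, ks))
    z = [mean_k]
    for d in range(2, D + 1):
        z.append(sum(pk * k * (k - 1) * (c * k ** nu) ** (d - 2)
                     for pk, k in zip(p, ks)))
    return z


def characteristic_time(z, lam):
    """t_0 = (D - 1) / lam * z_{D-1} / z_D, with D = len(z) >= 2."""
    D = len(z)
    return (D - 1) / lam * z[-2] / z[-1]


# Illustrative scan over the maximum degree (all values are assumptions).
for k_max in (100, 1000, 10000):
    z = z_generations(gamma=2.5, nu=0.2, k_max=k_max, D=4)
    print(f"k_max={k_max:6d}  z_D/z_(D-1)={z[-1] / z[-2]:10.2f}  "
          f"t_0={characteristic_time(z, lam=1.0):.4f}")
```

For these assortative parameters the last generation increasingly dominates and $t_0$ shrinks as $k_{\max}$ grows, which is the regime in which the power law growth is visible from early times.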

Focusing on the final size of the outbreak I obtain the following corollary: 

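Corollary IV.2. Consider the average total number of infected nodes

$$ N_I = N \int_0^{\infty} dt\, \rho(t) . \tag{18} $$

If the conditions of Theorem IV.1 are satisfied then

$$ N_I = N \left( \frac{\lambda}{\lambda+\mu} \right)^{D} \left[ 1 + O\!\left( \frac{(\lambda+\mu)\, t_0}{D-1} \right) \right] . \tag{19} $$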



From this Corollary it follows that by increasing the rate of node removal, through patient isolation or immunization,
we obtain only a gradual decrease of the final outbreak size. This implies that the concept of epidemic threshold loses
its meaning, since the outbreak size remains proportional to the population size for all removal rates. This conclusion is in
agreement with previous studies for the cases ($\gamma$, $\nu = 0$) and ($2 < \gamma < 3$, $\nu$). The above Corollary
extends these studies to the region $\gamma > 3$, demonstrating that when $\nu > 0$ there is no epidemic threshold for any
value of $\gamma$.
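
The gradual dependence of the final outbreak size on the removal rate can be illustrated numerically. The sketch below integrates the leading term of the normalized incidence, $\rho(t) \approx \lambda (\lambda t)^{D-1} e^{-(\lambda+\mu)t} / (D-1)!$, and compares it with the leading-order fraction $(\lambda/(\lambda+\mu))^D$. Correction terms are ignored, and the rates, the value of $D$, and the function names are assumptions chosen for illustration.

```python
import math


def final_fraction(lam, mu, D):
    """Leading-order N_I / N = (lam / (lam + mu))^D."""
    return (lam / (lam + mu)) ** D


def final_fraction_numeric(lam, mu, D, steps=100000):
    """Midpoint-rule integral of the leading term of rho(t) over [0, t_max]."""
    t_max = 60.0 / (lam + mu)  # the integrand is negligible beyond this point
    h = t_max / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += (lam * (lam * t) ** (D - 1) / math.factorial(D - 1)
                  * math.exp(-(lam + mu) * t) * h)
    return total


lam, D = 1.0, 4  # illustrative values
for mu in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"mu={mu:3.1f}  closed={final_fraction(lam, mu, D):.5f}  "
          f"numeric={final_fraction_numeric(lam, mu, D):.5f}")
```

The fraction decreases smoothly with $\mu$ and stays positive for every finite removal rate, in line with the absence of a threshold at this order of approximation.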



V. DISCUSSION 



Theorem IV.1 proposes a new law of spreading dynamics characterized by an initial power law growth. In essence,
the power law growth is a consequence of the small-world property and the divergence of the average reproductive
number. Its origin is better understood by analyzing the contribution of nodes at a distance $d$ from the root. The
distribution of infection times of nodes at generation $d$ is given by the distribution of the sum of $d$ generation times.
For the case of a constant infection rate this distribution is a gamma distribution, which is characterized by an initial
power law with exponent $d - 1$. This is the standard result for stochastic processes defined by a sequence of $d$ steps
happening at a constant rate. The total incidence is then obtained by superimposing the contribution of each generation
$d$, weighted by the average number of nodes at that generation. Since most nodes are found at generation $d = D$,
the contribution from that generation dominates the incidence progression, resulting in a power law growth with
exponent $D - 1$. The small-world property simply implies that $D$ is small and the resulting power law growth can
be distinguished from an exponential growth. The validity of this regime is restricted to time scales that are large
enough for an appreciable number of nodes at generation $D$ to be infected, and it is followed by an exponential
decay after most nodes at that generation are infected.
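
The initial power law of the gamma distribution can be checked directly. The sketch below evaluates the density of the sum of $d$ independent exponential times with a common rate (an Erlang density) and estimates its slope on a log-log scale at short times, which should be close to $d - 1$. The rate, the value of $d$, and the helper names are illustrative assumptions.

```python
import math


def erlang_pdf(t, d, rate):
    """Density of the sum of d independent exponential times with equal rate."""
    return rate * (rate * t) ** (d - 1) / math.factorial(d - 1) * math.exp(-rate * t)


def loglog_slope(f, t1, t2):
    """Slope of log f versus log t between the two times t1 < t2."""
    return (math.log(f(t2)) - math.log(f(t1))) / (math.log(t2) - math.log(t1))


rate = 1.0  # illustrative rate
for d in (1, 2, 4, 6):
    slope = loglog_slope(lambda t: erlang_pdf(t, d, rate), 1e-4, 1e-3)
    print(f"d={d}: short-time log-log slope = {slope:.4f} (expected about {d - 1})")
```

At times comparable to $1/\text{rate}$ the exponential factor takes over and the local slope falls below $d - 1$, which is why the power law regime is limited to short and intermediate times.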

To understand the relevance of this spreading law for real epidemic outbreaks, in the following I analyze the
validity of the conditions of Theorem IV.1 for real networks underlying the spreading of human infectious diseases
and computer malware.

Sexually transmitted diseases: Sexual contacts are a dominant transmission mechanism of HIV, the virus causing
AIDS. There are several reports indicating that the web of sexual contacts is characterized by a power law degree
distribution. The current debate is whether the exponent $\gamma$ is smaller or larger than three. In either case, it is known
that social networks are characterized by assortative degree correlations, which extends the validity of Theorem
IV.1 to $\gamma > 3$ (see Fig. 1). There is also empirical evidence indicating that the number of AIDS infections grows
as a power law in time for several populations. When this empirical evidence is put together with that
for sexual networks we obtain a strong indication that Theorem IV.1 provides the right explanation for the observed
power law growth.

Airborne diseases: Airborne diseases require contact or proximity between two individuals for their transmission.
In this case the graph edges represent potential contact or proximity interactions among humans and the degree of
an individual is given by the number of people with whom he/she interacts within a certain period of time. Recent
simulations of the Portland urban dynamics [43] show that the number of people an individual contacts within a
day follows a wide distribution up to about 10,000 contacts. A report for the 2002-2003 SARS epidemic shows a
wide distribution as well, in this case for the number of secondary cases generated by a primary SARS infection case.
Although these data are not sufficient to reach a definitive conclusion, they provide a clear indication that the number
of proximity contacts a human undergoes within a day is widely distributed. This observation, together with the high
degree of assortativity of social networks, opens the possibility that the spread of airborne diseases within a city is
described by Theorem IV.1.

Computer malware: Email worms and other computer malware such as computer viruses and hoaxes spread
through email communications. The email network is actually directed, i.e. the observation that user A has user
B in his/her address book does not imply the opposite. This is an important distinction since the detailed balance
condition (2) is valid for graphs with undirected edges. There is, however, a significant reciprocity, meaning that if
user A has user B in his/her address book then with high probability the opposite holds as well. Thus, in a
first approximation we can represent email connections by undirected links or edges and, in such a case, the detailed
balance condition (2) holds. Recent studies of the email network structure within university environments indicate
that they are characterized by a power law degree distribution with $\gamma \approx 2$. Therefore, the spreading of
computer malware represents another scenario for the application of Theorem IV.1. Further research is required to
determine the influence of the lack of reciprocity among some email users.

In conclusion, Theorem IV.1 characterizes the spreading dynamics on complex networks with wide connectivity
fluctuations. Its Corollary IV.2 determines the region of connectivity fluctuations and degree correlations where there
is no epidemic threshold. The empirical data indicates that the Theorem conditions are satisfied for several
networks underlying the spreading of infectious diseases among humans and computer malware among computers.
Therefore, I predict that Theorem IV.1 is a spreading law of modern epidemic outbreaks.



APPENDIX A: ITERATIVE APPROACH 

Let $P_n^{(d,k)}(t)$ be the probability distribution of the number $n$ of infected nodes at time $t$ (including those that have
since recovered) on a branch of the age-dependent AST (Definition III.1), given that the branch is rooted at a node at
generation $d$ that has degree $k$. In particular, $P_n^{(0,k)}(t)$ is the probability distribution of the total number of infected
nodes at time $t$, given that patient zero (the root of the tree) became infected at time zero and has degree $k$. Based on
the tree structure we can develop an iterative approach to compute $P_n^{(d,k)}(t)$ recursively.

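Proposition A.1. Let $i$ be a node at generation $d$ with degree $k$ and let us denote by $j$ its neighbors on the next
generation $d + 1$, where $j \in \{1, \ldots, k\}$ if $d = 0$, $j \in \{1, \ldots, k-1\}$ if $0 < d < D$, and $j \in \emptyset$ if $d = D$. Then

$$ P_n^{(0,k)}(t) = \sum_{n_1=0}^{\infty} \cdots \sum_{n_k=0}^{\infty} \delta_{n,\,1+\sum_j n_j} \prod_{j=1}^{k} \left[ r \sum_{k'=1}^{k_{\max}} q(k'|k) \int_0^t dG(\tau)\, P_{n_j}^{(1,k')}(t-\tau) + \delta_{n_j,0} \bigl( 1 - r + r[1 - G(t)] \bigr) \right] , \tag{A1} $$

$$ P_n^{(d,k)}(t) = \sum_{n_1=0}^{\infty} \cdots \sum_{n_{k-1}=0}^{\infty} \delta_{n,\,1+\sum_j n_j} \prod_{j=1}^{k-1} \left[ r \sum_{k'=1}^{k_{\max}} q(k'|k) \int_0^t dG(\tau)\, P_{n_j}^{(d+1,k')}(t-\tau) + \delta_{n_j,0} \bigl( 1 - r + r[1 - G(t)] \bigr) \right] , \quad 0 < d < D , \tag{A2} $$

$$ P_n^{(D,k)}(t) = \delta_{n,1} . \tag{A3} $$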



Proof. Let $n_i$ be the number of infected nodes in the branch rooted at node $i$, and let $n_j$ be the number of infected nodes
in the branches rooted at each of its neighbors $j$. Then

$$ n_i = 1 + \sum_j n_j . \tag{A4} $$

Since node $i$ and its neighbors lie on a tree, the $n_j$ are uncorrelated random variables. Furthermore, all nodes at
generation $d$ have the same statistical properties, i.e. the $n_j$ are identically distributed random variables. Let $Q_n^{(d+1,k)}(t)$
be the probability distribution of $n_j$, given that node $j$ is at generation $d + 1$ and its ancestor has degree $k > 0$. With
probability $1 - r$ the node $j$ is not infected at any time and with probability $1 - G(t)$ it is not yet infected at time $t$,
given that it will be infected at some later time. Thus

$$ Q_0^{(d+1,k)}(t) = 1 - r + r[1 - G(t)] . \tag{A5} $$

On the other hand, with probability $r$ node $j$ will be infected at some time $\tau$, with distribution function $G(\tau)$, and
continue the spreading dynamics to subsequent generations. Once node $j$ is infected, the number of infected nodes
in the branch rooted at node $j$ is a random variable with probability distribution $P_n^{(d+1,k')}(t - \tau)$, given that node $j$ has
degree $k'$. Therefore,

$$ Q_n^{(d+1,k)}(t) = r \sum_{k'=1}^{k_{\max}} q(k'|k) \int_0^t dG(\tau)\, P_n^{(d+1,k')}(t-\tau) \tag{A6} $$

for $n > 0$. From (A4), (A5) and (A6) we finally obtain equations (A1)-(A3). □

Let $I(t)dt$ be the average number of nodes that are infected between time $t$ and $t + dt$ (incidence), i.e.

$$ I(t) = \frac{d}{dt} \sum_{k=0}^{\infty} p_k \sum_{n=0}^{\infty} n\, P_n^{(0,k)}(t) . \tag{A7} $$

Using the recursive relations (A1)-(A3) for $P_n^{(d,k)}(t)$ we obtain the following result.

Proposition A.2.

$$ I(t) = \sum_{d=1}^{D} z_d\, r^d\, \frac{dG^{*d}(t)}{dt} , \tag{A8} $$

where

$$ G^{*d}(t) = \int_0^{t} dG(\tau_1) \int_0^{t-\tau_1} dG(\tau_2) \cdots \int_0^{t-\tau_1-\cdots-\tau_{d-1}} dG(\tau_d) \tag{A9} $$

is the $d$-order convolution of $G(t)$, giving the probability distribution function of the sum of $d$ generation times.
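Proof. To obtain $I(t)$ we are going to make use of the generating function

$$ F^{(d,k)}(x,t) = \sum_{n=0}^{\infty} P_n^{(d,k)}(t)\, x^n . \tag{A10} $$

Using the recursive equations (A1)-(A3) for $P_n^{(d,k)}(t)$ we obtain

$$ F^{(0,k)}(x,t) = x \left\{ 1 - r + r[1 - G(t)] + r \sum_{k'=1}^{k_{\max}} q(k'|k) \int_0^t dG(\tau)\, F^{(1,k')}(x, t-\tau) \right\}^{k} , \tag{A11} $$

$$ F^{(d,k)}(x,t) = x \left\{ 1 - r + r[1 - G(t)] + r \sum_{k'=1}^{k_{\max}} q(k'|k) \int_0^t dG(\tau)\, F^{(d+1,k')}(x, t-\tau) \right\}^{k-1} , \tag{A12} $$

$$ F^{(D,k)}(x,t) = x . \tag{A13} $$

We denote by

$$ M^{(d,k)}(t) = \frac{\partial F^{(d,k)}(1,t)}{\partial x} \tag{A14} $$

the mean number of infected nodes on the branch rooted at a node at generation $d$ with degree $k$. Making use of the
recursive relations (A11)-(A13) we obtain

$$ M^{(0,k)}(t) = 1 + k\, r \sum_{k'=1}^{k_{\max}} q(k'|k) \int_0^t dG(\tau)\, M^{(1,k')}(t-\tau) , \tag{A15} $$

$$ M^{(d,k)}(t) = 1 + (k-1)\, r \sum_{k'=1}^{k_{\max}} q(k'|k) \int_0^t dG(\tau)\, M^{(d+1,k')}(t-\tau) , \tag{A16} $$

$$ M^{(D,k)}(t) = 1 . \tag{A17} $$

Iterating the recursive relations (A15)-(A17) from $d = D$ to $d = 0$ we obtain

$$ M^{(0,k)}(t) = 1 + \sum_{d=1}^{D} r^d\, G^{*d}(t)\; k \sum_{k_1} q(k_1|k)(k_1-1) \sum_{k_2} q(k_2|k_1)(k_2-1) \cdots \sum_{k_{d-1}} q(k_{d-1}|k_{d-2})(k_{d-1}-1) . \tag{A18} $$

Note that from (A10) and (A14) it follows that

$$ \sum_{n=1}^{\infty} n\, P_n^{(0,k)}(t) = M^{(0,k)}(t) . \tag{A19} $$

Substituting (A18) into (A19) and the result into (A7) we obtain (A8) with

$$ z_d = \sum_{k} p_k\, k \sum_{k_1} q(k_1|k)(k_1-1) \sum_{k_2} q(k_2|k_1)(k_2-1) \cdots \sum_{k_{d-1}} q(k_{d-1}|k_{d-2})(k_{d-1}-1) . \tag{A20} $$

Finally, using the detailed balance condition (2) we reduce (A20) to (12).

□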
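
The relation between the nested sums for $z_d$ and the compact form $z_d = \sum_k p_k k(k-1) K_k^{d-2}$ (with $z_1 = \langle k \rangle$) can be checked numerically in a simple setting. The sketch below does so only for the uncorrelated case, $q(k'|k) = k' p_{k'} / \langle k \rangle$, where $K_k$ is independent of $k$; it is an illustration under that assumption and does not test correlated graphs. The toy distribution and all function names are hypothetical.

```python
def uncorrelated(pk):
    """Build q(k2|k) = k2 p_k2 / <k> and K_k = sum_k2 q(k2|k) (k2 - 1)."""
    mean_k = sum(k * p for k, p in pk.items())
    row = {k2: k2 * p2 / mean_k for k2, p2 in pk.items()}
    q = {k: row for k in pk}  # every row is identical without correlations
    K = {k: sum(row[k2] * (k2 - 1) for k2 in pk) for k in pk}
    return q, K


def z_nested(pk, q, d):
    """z_d from the nested sums, evaluated from the innermost sum outward."""
    w = {k: 1.0 for k in pk}
    for _ in range(d - 1):
        w = {k: sum(q[k][k2] * (k2 - 1) * w[k2] for k2 in pk) for k in pk}
    return sum(pk[k] * k * w[k] for k in pk)


def z_reduced(pk, K, d):
    """z_1 = <k>; z_d = sum_k p_k k (k - 1) K_k^(d - 2) for d > 1."""
    if d == 1:
        return sum(p * k for k, p in pk.items())
    return sum(p * k * (k - 1) * K[k] ** (d - 2) for k, p in pk.items())


# Toy degree distribution (an assumption chosen only for illustration).
toy_pk = {1: 0.5, 2: 0.3, 3: 0.2}
q, K = uncorrelated(toy_pk)
for d in range(1, 6):
    print(f"d={d}: nested={z_nested(toy_pk, q, d):.6f}  reduced={z_reduced(toy_pk, K, d):.6f}")
```

In this uncorrelated setting both expressions reduce to $\langle k \rangle K^{d-1}$ with $K = \langle k(k-1) \rangle / \langle k \rangle$, so the two printed columns coincide for every $d$.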